Exploratory Data Analysis (EDA)

Visualizing and Interpreting Job Market Trends

Authors (Affiliation)

Connor Coulter (Boston University)

Wei Wang (Boston University)

Balqis Bevi Abdul Hannan Kanaga (Boston University)

Loaded dataset: (72498, 131)
Derived non-null: {'INDUSTRY_DISPLAY': np.int64(72454), 'SALARY_DISPLAY': np.int64(72498)}
Remaining columns (first 30): ['LAST_UPDATED_DATE', 'POSTED', 'EXPIRED', 'DURATION', 'SOURCE_TYPES', 'SOURCES', 'URL', 'MODELED_EXPIRED', 'MODELED_DURATION', 'COMPANY', 'COMPANY_NAME', 'COMPANY_IS_STAFFING', 'EDUCATION_LEVELS', 'EDUCATION_LEVELS_NAME', 'MIN_EDULEVELS', 'MIN_EDULEVELS_NAME', 'MAX_EDULEVELS', 'MAX_EDULEVELS_NAME', 'EMPLOYMENT_TYPE', 'EMPLOYMENT_TYPE_NAME', 'MIN_YEARS_EXPERIENCE', 'MAX_YEARS_EXPERIENCE', 'IS_INTERNSHIP', 'SALARY', 'REMOTE_TYPE', 'REMOTE_TYPE_NAME', 'ORIGINAL_PAY_PERIOD', 'SALARY_TO', 'SALARY_FROM', 'LOCATION']
/tmp/ipykernel_5942/3965281858.py:37: FutureWarning:

A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
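The warning above comes from calling an in-place method on a column selected from the DataFrame. The following is a minimal sketch of the two replacements the message itself suggests. It assumes the offending call was `fillna` (the warning does not name the method), and the toy values are invented for illustration.

```python
import pandas as pd

# Toy frame: column names mirror the dataset, values are made up.
df = pd.DataFrame({
    "SALARY": [95000.0, None, 120000.0],
    "REMOTE_TYPE_NAME": ["Remote", None, "Hybrid Remote"],
})

# Pattern that triggers the FutureWarning (left commented out):
# df["SALARY"].fillna(df["SALARY"].median(), inplace=True)

# Option 1: assign the filled column back to the frame.
df["SALARY"] = df["SALARY"].fillna(df["SALARY"].median())

# Option 2: call fillna on the frame with a {column: value} mapping.
df = df.fillna({"REMOTE_TYPE_NAME": "Unknown"})

print(df)
```

Both forms operate on the original object rather than on an intermediate copy, so they should keep working once the pandas 3.0 behavior change takes effect.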


Removed 3300 duplicates using ['TITLE', 'COMPANY_NAME', 'LOCATION', 'POSTED']
Final SALARY_DISPLAY Non-Null Count: 69198
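The counts in the log are consistent: 72,498 loaded rows minus 3,300 duplicates leaves 69,198. A deduplication step of this kind might look like the sketch below. The key columns are taken from the log line; the toy rows and the choice of `keep="first"` are assumptions, since the original cell is not shown.

```python
import pandas as pd

# Key columns reported in the log above.
dedup_keys = ["TITLE", "COMPANY_NAME", "LOCATION", "POSTED"]

# Toy postings (invented): the first two rows share all four key values.
df = pd.DataFrame({
    "TITLE": ["Data Analyst", "Data Analyst", "Data Engineer"],
    "COMPANY_NAME": ["Acme", "Acme", "Acme"],
    "LOCATION": ["Boston, MA", "Boston, MA", "Boston, MA"],
    "POSTED": ["2024-06-01", "2024-06-01", "2024-06-02"],
})

before = len(df)
# Keep the first posting in each group of identical key values.
df = df.drop_duplicates(subset=dedup_keys, keep="first").reset_index(drop=True)
removed = before - len(df)

print(f"Removed {removed} duplicates using {dedup_keys}")
```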

1 Job Postings by Industry (Top 15)

1.1 Rationale

This view highlights the sectors where demand is concentrated and shows which industries are hiring most actively.

1.2 Key Insights

  • Top Hiring Industries: Custom Computer Programming, Management Consulting, and Employment Agencies dominate job postings.
  • Skewed Distribution: The top 4 industries account for a significantly larger share of job postings than the rest.
  • Professional Services Focus: Many high-posting sectors are centered on tech, consulting, healthcare, and education, reflecting demand for knowledge-based roles.
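The ranking and concentration described in these insights can be computed roughly as follows. This is a sketch, not the notebook's actual code: `top_industries` is a hypothetical helper, `INDUSTRY_DISPLAY` is the derived column named in the load log, and the toy data is invented.

```python
import pandas as pd

def top_industries(df, col="INDUSTRY_DISPLAY", n=15):
    """Return posting counts and shares for the n most frequent industries."""
    counts = df[col].value_counts().head(n)   # sorted descending, NaN excluded
    share = counts / df[col].notna().sum()    # fraction of classified postings
    return pd.DataFrame({"postings": counts, "share": share})

# Toy data (invented): three industries with 5, 3, and 2 postings.
toy = pd.DataFrame({"INDUSTRY_DISPLAY": ["A"] * 5 + ["B"] * 3 + ["C"] * 2})

summary = top_industries(toy, n=2)
top_share = summary["share"].sum()  # combined share of the listed industries

print(summary)
print(f"Combined share: {top_share:.1%}")
```

On the real data, calling `top_industries(df).head(4)["share"].sum()` would put a number on how much of the market the top four industries hold.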

2 Salary Distribution by Industry (Top 15)

2.1 Rationale

This comparison highlights which industries pay well and where candidates may have negotiating power.

2.2 Key Insights

  • Wide Salary Ranges in Staffing & Tech Services: Industries like Temporary Help Services and Employment Placement Agencies exhibit large salary spreads with high outliers, though their median pay remains modest.
  • Stable Pay in Professional Sectors: Most industries maintain a consistent median salary around $100K–$150K, reflecting standardized compensation and less variation in negotiation power.
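One way to check claims about median pay and spread is to summarize salaries per industry with the median and interquartile range. The sketch below assumes `SALARY_DISPLAY` is numeric and uses a hypothetical helper, `salary_spread`; the toy salaries are invented.

```python
import pandas as pd

def salary_spread(df, industry_col="INDUSTRY_DISPLAY", salary_col="SALARY_DISPLAY"):
    """Median, quartiles, and IQR of salary for each industry."""
    grouped = df.groupby(industry_col)[salary_col]
    out = pd.DataFrame({
        "median": grouped.median(),
        "q1": grouped.quantile(0.25),
        "q3": grouped.quantile(0.75),
    })
    out["iqr"] = out["q3"] - out["q1"]  # width of the middle 50% of salaries
    return out.sort_values("median", ascending=False)

# Toy data (invented): a wide-spread industry and a tightly clustered one.
toy = pd.DataFrame({
    "INDUSTRY_DISPLAY": ["Staffing"] * 4 + ["Consulting"] * 3,
    "SALARY_DISPLAY": [60000, 80000, 100000, 200000, 110000, 120000, 130000],
})

spread = salary_spread(toy)
print(spread)
```

A large `iqr` paired with a modest `median` matches the pattern described for the staffing industries, while a small `iqr` indicates more standardized pay.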

3 Remote vs. On-Site Jobs

3.1 Rationale

Workplace flexibility is a major factor in today’s job market.

3.2 Key Insights

  • Limited Remote Availability: Only about 17% of job postings are labeled Remote. Hybrid Remote and Not Remote each make up an even smaller portion, roughly 5% combined once the unclassified postings are accounted for.
  • Data Gaps in Job Listings: 78.3% of postings carry no remote classification, which points to incomplete employer data or inconsistent labeling and may hinder job seekers who filter listings by work arrangement.
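Percentages like these can be derived by counting each remote label, including the missing ones. The sketch below assumes unclassified postings appear as missing values in `REMOTE_TYPE_NAME` (they could instead be stored as a placeholder string in the real data); `remote_shares` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd

def remote_shares(df, col="REMOTE_TYPE_NAME"):
    """Percentage of postings per remote label, counting missing values."""
    labels = df[col].fillna("Unclassified")  # make missing values visible
    return (labels.value_counts(normalize=True) * 100).round(1)

# Toy data (invented): 3 of 5 postings have no remote label.
toy = pd.DataFrame({
    "REMOTE_TYPE_NAME": ["Remote", None, None, "Hybrid Remote", None],
})

shares = remote_shares(toy)
print(shares)
```

Reporting the unclassified share alongside the labeled ones makes it clear how much of the apparent scarcity of remote roles may be a labeling gap rather than a true absence.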